Analysis of Covariate Results

Author

Lauren Khoury

Published

July 22, 2025


Background

We ran simulations to generate data, fit linear models using different covariate selection methods, and extracted the results. In each simulation, we generated data with a dichotomous \(X\). We generated the covariates and \(Y\) from a multivariate normal distribution with \(\mu = 0\) and a covariance matrix based on the correlation values listed below.

We manipulated the following variables:

Variable Description Values
b_x The population parameter for \(X\) 0, 0.3, 0.5
n_obs Number of observations in a sample 50, 100, 150, 200, 300, 400
n_covs Number of total available covariates 4, 8, 12, 16, 20
p_good_covs Proportion of “good” covariates* 0.25, 0.50, 0.75
r_ycov Correlation between \(Y\) and the “good” covariates 0.3, 0.5
r_cov Correlation between the “good” covariates* 0.3

* Note: here we define “good” covariates as ones that have a nonzero relationship with \(Y\)
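A minimal sketch of the assumed data-generating step may make this concrete (illustrative only; the simulation's actual code may differ, and `simulate_data` is our own name). The "good" covariates correlate `r_ycov` with \(Y\) and `r_cov` with each other; the remaining covariates are independent noise; the \(X\) effect is added afterward.

```r
# Sketch of generating one simulated dataset (assumed structure).
simulate_data <- function(n_obs = 150, n_covs = 4, p_good_covs = 0.25,
                          r_ycov = 0.3, r_cov = 0.3, b_x = 0) {
  n_good <- round(n_covs * p_good_covs)
  sigma <- diag(n_covs + 1)              # order: (Y, cov1, ..., covK)
  good <- seq_len(n_good) + 1            # columns of the "good" covariates
  sigma[1, good] <- sigma[good, 1] <- r_ycov
  for (i in good) for (j in good) if (i != j) sigma[i, j] <- r_cov
  dat <- as.data.frame(
    MASS::mvrnorm(n_obs, mu = rep(0, n_covs + 1), Sigma = sigma))
  names(dat) <- c("y", paste0("cov", seq_len(n_covs)))
  dat$x <- rep(0:1, length.out = n_obs)  # dichotomous X
  dat$y <- dat$y + b_x * dat$x           # add the X effect
  dat
}
```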


We fully crossed all levels, yielding 540 unique research settings.
We used the following 9 methods to select covariates to include in a linear model:

  1. No covariates
  2. All covariates
  3. P-hacking
  4. Bivariate correlation
  5. Partial correlation (including \(X\) in the model)
  6. Full lm (including \(X\) in the model)
  7. Full lm without \(X\) in the model
  8. LASSO (including \(X\) in the model)
  9. LASSO without \(X\) in the model
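The p-hacking method can be illustrated with a minimal sketch of one plausible form of it: greedily adding whichever covariate most reduces the p-value of \(X\). This is an assumption about the search rule (the simulation's actual procedure may differ), and `p_hack` is our own hypothetical name.

```r
# Greedy covariate p-hacking (assumed form): keep adding the covariate that
# most lowers the p-value of X, stopping when no addition helps.
p_hack <- function(dat, covs) {
  p_of_x <- function(selected) {
    fit <- lm(reformulate(c("x", selected), response = "y"), data = dat)
    summary(fit)$coefficients["x", "Pr(>|t|)"]
  }
  selected <- character(0)
  best_p <- p_of_x(selected)
  repeat {
    ps <- vapply(setdiff(covs, selected),
                 function(cv) p_of_x(c(selected, cv)), numeric(1))
    if (length(ps) == 0 || min(ps) >= best_p) break
    best_p <- min(ps)
    selected <- c(selected, names(ps)[which.min(ps)])
  }
  selected
}
```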

We fit a linear model for each method and from the model output, extracted the estimate for \(X\), standard error of this estimate, and p-value of this \(X\) effect.
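As a sketch, the fit-and-extract step amounts to the following (`fit_and_extract` is our hypothetical name; the column labels are those returned by `summary.lm`):

```r
# Fit a model with X and the selected covariates, then pull the X row
# from the coefficient table.
fit_and_extract <- function(dat, selected_covs = character(0)) {
  fit <- lm(reformulate(c("x", selected_covs), response = "y"), data = dat)
  coefs <- summary(fit)$coefficients
  c(estimate = coefs["x", "Estimate"],
    SE       = coefs["x", "Std. Error"],
    p_value  = coefs["x", "Pr(>|t|)"])
}
```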

We repeated this 20,000 times for each research setting.

We present the results here.

Data Analysis

Preview data

The total amount of data simulated equals 97,200,000 observations = 540 unique research settings \(\times\) 20,000 simulations each \(\times\) 9 methods. We will consider the data separated by the \(X\) effect.

Data for \(b_x = 0\), as an example, is shown below.

Rows: 16,800,000
Columns: 16
$ job_num       <int> 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 10~
$ method        <chr> "no_covs", "all_covs", "p_hacked", "r", "partial_r", "fu~
$ simulation_id <int> 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3,~
$ estimate      <dbl> 0.2158, 0.1872, 0.2218, 0.1812, 0.1812, 0.1812, 0.1872, ~
$ SE            <dbl> 0.157, 0.151, 0.157, 0.151, 0.151, 0.151, 0.151, 0.156, ~
$ p_value       <dbl> 0.171, 0.218, 0.159, 0.234, 0.234, 0.234, 0.217, 0.515, ~
$ ndf           <int> 1, 5, 3, 2, 2, 2, 4, 1, 5, 4, 2, 2, 2, 3, 1, 5, 1, 2, 2,~
$ ddf           <int> 148, 144, 146, 147, 147, 147, 145, 148, 144, 145, 147, 1~
$ covs_tpr      <dbl> 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1,~
$ covs_fpr      <dbl> 0.000, 1.000, 0.667, 0.000, 0.000, 0.000, 0.667, 0.000, ~
$ n_obs         <int> 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 1~
$ b_x           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ n_covs        <int> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,~
$ r_ycov        <dbl> 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0~
$ p_good_covs   <dbl> 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.~
$ r_cov         <dbl> 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0~

Zero \(X\) Effect

First, we look at the zero \(X\) effect condition to compare the Type I errors across methods and research settings.

Type I error

We will look at the overall Type I error across methods, then will compare the error of each method across each of the manipulated variables: n_obs, n_covs, p_good_covs, r_ycov, and r_cov.

by method

We will first consider the Type I error by the selection method. Here we calculate the proportion of significant effects (\(p < 0.05\)), the Type I error, displayed below as a bar plot.
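As a sketch, this computation reduces to a per-method proportion, assuming a results data frame shaped like the preview above (`sim_results` and `type1_error` are our hypothetical names):

```r
# Type I error per method: proportion of p-values below alpha when b_x = 0.
type1_error <- function(sim_results, alpha = 0.05) {
  aggregate(p_value ~ method, data = sim_results,
            FUN = function(p) mean(p < alpha))
}
```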

Figure 1

From this bar plot, we see that the p-hacking method for selecting covariates leads to inflated Type I error rates. The no covariates, all covariates, and bivariate correlation (“r”) approaches all sit at the expected \(\alpha = 0.05\) mark, the partial correlation (“partial r”) approach shows slight inflation, and the full lm and lasso methods show further inflation of Type I error.

distributions

Here we view the distributions of the Type I error rate by method, beginning by isolating the p-hacked method.

Figure 2

We see that while the average Type I error for the p-hacked method was 0.198, it reached as high as 0.462, further emphasizing the inflation of error.

Removing this method, we can view the distributions of the remaining 8 methods.

Figure 3

We see that the no covariates, all covariates, and r selection methods show approximately normal distributions centered around 0.05, while partial r, full lm, and lasso are slightly right-skewed, with full lm showing the greatest skew.

by n_obs

We will view the Type I error rates of each method for the different levels of the number of observations in a sample. In the table below, we see the minimum, maximum, and average Type I error for each level of n_obs by each method.

Table 1: Type I error by n_obs
n_obs method typeI_min typeI_max typeI_mean
50 no_covs 0.048 0.053 0.050
50 all_covs 0.048 0.052 0.050
50 p_hacked 0.084 0.426 0.215
50 r 0.047 0.052 0.050
50 partial_r 0.050 0.072 0.057
50 full_lm 0.054 0.096 0.074
50 full_lm_wo_x 0.048 0.052 0.050
50 lasso 0.052 0.089 0.069
50 lasso_wo_x 0.048 0.052 0.050
100 no_covs 0.048 0.052 0.050
100 all_covs 0.048 0.053 0.050
100 p_hacked 0.078 0.449 0.203
100 r 0.048 0.053 0.050
100 partial_r 0.050 0.059 0.052
100 full_lm 0.051 0.087 0.064
100 full_lm_wo_x 0.047 0.054 0.050
100 lasso 0.051 0.067 0.058
100 lasso_wo_x 0.048 0.052 0.050
150 no_covs 0.047 0.053 0.050
150 all_covs 0.047 0.052 0.050
150 p_hacked 0.076 0.454 0.198
150 r 0.048 0.051 0.050
150 partial_r 0.049 0.054 0.051
150 full_lm 0.049 0.081 0.059
150 full_lm_wo_x 0.047 0.052 0.050
150 lasso 0.049 0.062 0.055
150 lasso_wo_x 0.047 0.052 0.050
200 no_covs 0.048 0.052 0.050
200 all_covs 0.048 0.053 0.050
200 p_hacked 0.074 0.456 0.194
200 r 0.047 0.053 0.050
200 partial_r 0.048 0.054 0.051
200 full_lm 0.048 0.071 0.056
200 full_lm_wo_x 0.048 0.052 0.050
200 lasso 0.049 0.059 0.054
200 lasso_wo_x 0.048 0.052 0.050
300 no_covs 0.047 0.053 0.050
300 all_covs 0.048 0.053 0.050
300 p_hacked 0.073 0.455 0.189
300 r 0.048 0.053 0.050
300 partial_r 0.048 0.053 0.051
300 full_lm 0.049 0.061 0.054
300 full_lm_wo_x 0.049 0.052 0.050
300 lasso 0.049 0.056 0.052
300 lasso_wo_x 0.048 0.052 0.050
400 no_covs 0.048 0.053 0.050
400 all_covs 0.047 0.052 0.050
400 p_hacked 0.071 0.462 0.186
400 r 0.047 0.052 0.050
400 partial_r 0.047 0.053 0.051
400 full_lm 0.047 0.058 0.053
400 full_lm_wo_x 0.047 0.052 0.050
400 lasso 0.047 0.055 0.052
400 lasso_wo_x 0.047 0.052 0.050

Looking at the average column, we see that the Type I error is not affected much for the no covariates, all covariates, r, and partial r approaches across different sample sizes. However, for p-hacked, full lm, and lasso, the Type I error does decrease as sample size increases. This can be visualized in the plot below.

Note: For the line plots throughout this report, a solid line will indicate a method that does not involve performing selection of covariates (i.e., using no or all covariates). A dashed line will indicate a method that does involve performing a non-trivial selection of covariates. A dotted black line will indicate the expected value (if applicable). For example, here we see a dotted line at \(\alpha = 0.05\), the expected Type I error rate.

Figure 4

Again, we see the inflation of Type I error for p-hacking. For smaller sample sizes, we see that lasso and full lm perform worse than the other methods, but the methods become comparable as sample size increases.

by n_covs

We will view the Type I error rates of each method for the different number of available covariates (Note: this is not necessarily the number of covariates included in the model.). In the table below, we see the minimum, maximum, and average Type I error for each level of n_covs by each method.

Table 2: Type I error by n_covs
n_covs method typeI_min typeI_max typeI_mean
4 no_covs 0.047 0.052 0.050
4 all_covs 0.047 0.053 0.050
4 p_hacked 0.071 0.140 0.098
4 r 0.047 0.053 0.050
4 partial_r 0.047 0.058 0.051
4 full_lm 0.047 0.062 0.053
4 full_lm_wo_x 0.048 0.052 0.050
4 lasso 0.047 0.058 0.052
4 lasso_wo_x 0.048 0.052 0.050
8 no_covs 0.047 0.052 0.050
8 all_covs 0.048 0.053 0.050
8 p_hacked 0.094 0.238 0.150
8 r 0.047 0.052 0.050
8 partial_r 0.048 0.062 0.052
8 full_lm 0.048 0.082 0.056
8 full_lm_wo_x 0.047 0.054 0.050
8 lasso 0.049 0.067 0.054
8 lasso_wo_x 0.047 0.052 0.050
12 no_covs 0.049 0.052 0.050
12 all_covs 0.047 0.052 0.050
12 p_hacked 0.117 0.329 0.201
12 r 0.047 0.052 0.050
12 partial_r 0.047 0.064 0.052
12 full_lm 0.047 0.093 0.060
12 full_lm_wo_x 0.047 0.052 0.050
12 lasso 0.048 0.074 0.057
12 lasso_wo_x 0.047 0.052 0.050
16 no_covs 0.048 0.053 0.050
16 all_covs 0.048 0.053 0.050
16 p_hacked 0.139 0.404 0.249
16 r 0.047 0.053 0.050
16 partial_r 0.050 0.069 0.053
16 full_lm 0.051 0.094 0.064
16 full_lm_wo_x 0.047 0.052 0.050
16 lasso 0.051 0.080 0.059
16 lasso_wo_x 0.048 0.052 0.050
20 no_covs 0.048 0.051 0.050
20 all_covs 0.047 0.052 0.050
20 p_hacked 0.165 0.462 0.290
20 r 0.048 0.052 0.050
20 partial_r 0.048 0.072 0.053
20 full_lm 0.050 0.096 0.066
20 full_lm_wo_x 0.048 0.052 0.050
20 lasso 0.050 0.089 0.061
20 lasso_wo_x 0.047 0.052 0.050

Looking at the average column, we see that the Type I error is not affected much for the no covariates, all covariates, r, and partial r approaches across different amounts of available covariates. However, for p-hacked, full lm, and lasso, the Type I error increases as the number of covariates increases. This can be visualized in the plot below.

Figure 5

We see the large increase in error for the p-hacked method and the slight increase for full lm and lasso methods as the number of covariates increases, while the other methods stay around 0.05.

by p_good_covs

We will view the Type I error rates of each method for the different proportions of good covariates. In the table below, we see the minimum, maximum, and average Type I error for each level of p_good_covs by each method.

Table 3: Type I error by p_good_covs
p_good_covs method typeI_min typeI_max typeI_mean
0.25 no_covs 0.048 0.053 0.050
0.25 all_covs 0.047 0.052 0.050
0.25 p_hacked 0.071 0.344 0.163
0.25 r 0.048 0.053 0.050
0.25 partial_r 0.049 0.072 0.053
0.25 full_lm 0.050 0.090 0.057
0.25 full_lm_wo_x 0.047 0.052 0.050
0.25 lasso 0.050 0.089 0.058
0.25 lasso_wo_x 0.047 0.052 0.050
0.50 no_covs 0.047 0.053 0.050
0.50 all_covs 0.047 0.053 0.050
0.50 p_hacked 0.084 0.400 0.202
0.50 r 0.047 0.052 0.050
0.50 partial_r 0.048 0.063 0.052
0.50 full_lm 0.048 0.094 0.060
0.50 full_lm_wo_x 0.047 0.054 0.050
0.50 lasso 0.049 0.084 0.057
0.50 lasso_wo_x 0.047 0.052 0.050
0.75 no_covs 0.048 0.053 0.050
0.75 all_covs 0.047 0.053 0.050
0.75 p_hacked 0.093 0.462 0.227
0.75 r 0.047 0.053 0.050
0.75 partial_r 0.047 0.059 0.051
0.75 full_lm 0.047 0.096 0.062
0.75 full_lm_wo_x 0.047 0.052 0.050
0.75 lasso 0.047 0.081 0.055
0.75 lasso_wo_x 0.048 0.052 0.050

Looking at the average column, we see that the Type I error is not affected for the no covariates, all covariates, and r approaches across different proportions. However, for p-hacked, partial r, full lm, and lasso, the Type I error changes as the proportion of good covariates increases. This can be visualized in the plots below.

Figure 6

In this plot, we mainly see the increase in Type I error for the p-hacking approach as the proportion of good covariates increases. As there are more good covariates, the Type I error decreases for lasso and partial r, but it increases for full lm.

by correlations

Recall that in these simulations, the correlation among the good covariates was held constant at 0.3. So, we will look at the Type I error rates of each method by the correlation between \(Y\) and the good covariates.

Table 4: Type I error by y-cov correlations
r_ycov method typeI_min typeI_max typeI_mean
0.3 no_covs 0.047 0.053 0.050
0.3 all_covs 0.048 0.053 0.050
0.3 p_hacked 0.071 0.276 0.145
0.3 r 0.047 0.053 0.050
0.3 partial_r 0.048 0.072 0.053
0.3 full_lm 0.050 0.075 0.059
0.3 full_lm_wo_x 0.047 0.054 0.050
0.3 lasso 0.049 0.085 0.058
0.3 lasso_wo_x 0.048 0.052 0.050
0.5 no_covs 0.047 0.053 0.050
0.5 all_covs 0.047 0.053 0.050
0.5 p_hacked 0.082 0.462 0.250
0.5 r 0.047 0.053 0.050
0.5 partial_r 0.047 0.058 0.051
0.5 full_lm 0.047 0.096 0.060
0.5 full_lm_wo_x 0.047 0.052 0.050
0.5 lasso 0.047 0.089 0.056
0.5 lasso_wo_x 0.047 0.052 0.050

Looking at the average column, we see that the error does not change across correlations for the no covariates, all covariates, r, and partial r approaches. It changes slightly for full lm and lasso, and drastically for p-hacking, such that a higher correlation between \(Y\) and the good covariates increases the Type I error.

Figure 7

In the plot, we see the Type I error for the p-hacking method increase as the correlation between \(Y\) and the good covariates increases. The other methods remain at relatively the same Type I error rate across correlation values.

Estimate, SD, & SE

Here we will compare, across methods, the estimate of \(b_x\), the standard deviation of the estimate, and the average standard error of the estimate. The standard deviation is calculated as the SD of the sampling distribution of the estimates. The standard error is from the linear model output. Since the mean of standard errors would be biased, we calculate the average SE by taking the square root of the mean of the squared standard errors. We compare the differences by subtracting this average linear model SE from the calculated SD.
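As a sketch, these summaries can be computed as follows (`summarize_sampling` is our hypothetical name; the "average SE" is the root mean square of the model SEs, as described):

```r
# SD of the sampling distribution of the estimates vs. the RMS-averaged
# model SE, and their difference.
summarize_sampling <- function(estimates, ses) {
  sd_est <- sd(estimates)
  rms_se <- sqrt(mean(ses^2))
  c(SD_estimate = sd_est, SE_mean = rms_se, difference = sd_est - rms_se)
}
```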

Table 5: b_x = 0
method mean_estimate SD_estimate SE_mean difference
no_covs -0.000 0.178 0.178 0.000
all_covs 0.000 0.155 0.155 0.000
p_hacked 0.000 0.238 0.157 0.080
r 0.000 0.145 0.145 0.000
partial_r 0.000 0.147 0.145 0.003
full_lm 0.000 0.156 0.147 0.008
full_lm_wo_x 0.000 0.147 0.147 0.000
lasso 0.000 0.150 0.143 0.007
lasso_wo_x 0.000 0.143 0.143 0.000

We see that all methods have an average estimate of 0, as expected given the population value for \(b_x\). The standard deviation of the sampling distribution of the estimate equals the standard error for the no covariates, all covariates, and r selection methods. There are small differences between these values for the partial r, full lm, and lasso methods, while the p-hacking method shows a large difference.

Sampling Distributions

Here we view a sampling distribution of the estimate for \(b_x\) for each method.

Figure 8

We see again that each method’s distribution is centered around 0, but the p-hacked method’s distribution departs from normality because the selection process distorts the parameter estimates. In addition, we see that the no covariates method has more variability in its estimate for \(b_x\) than the other valid selection methods.

From these primary analyses, we see that the p-hacked method leads to inflated Type I error rates and biased parameter estimates. We can conclude that it is not a statistically valid method. For the following analyses, we will not include the p-hacked method. While partial r, full lm, and lasso selection methods showed slight inflation of Type I error, we might be willing to accept this for greater reductions in Type II error, which we will compare in the next section.

Nonzero \(X\) Effect

Next, we look at the nonzero \(X\) effect condition to compare the Type II errors across methods and research settings. Recall that we set two values for \(b_x\) of 0.3 and 0.5.

Type II Error

We will look at the overall Type II error across methods (except p-hacked), then will again compare the error of each method across each of the manipulated variables: n_obs, n_covs, p_good_covs, r_ycov, and r_cov.

by method

We will first consider the Type II error by the selection method, for both \(b_x = 0.3\) and \(b_x = 0.5\). Here we calculate the proportion of non-significant effects (\(p \geq 0.05\)), the Type II error, displayed below as bar plots.
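As a sketch, this is the same per-method proportion as the Type I error computation, applied to the nonzero-effect conditions (`sim_results` and `type2_error` are our hypothetical names):

```r
# Type II error per method: proportion of non-significant p-values when the
# true effect is nonzero.
type2_error <- function(sim_results, alpha = 0.05) {
  aggregate(p_value ~ method, data = sim_results,
            FUN = function(p) mean(p >= alpha))
}
```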

b_x = 0.3
Figure 9

In this first plot for \(b_x = 0.3\), we see that the Type II error is highest when no covariates are used in the model. There is a large reduction in Type II error when we include all covariates compared to no covariates, and a slight further reduction in Type II error when we use a selection method for covariates compared to including all.

distributions

Here we view the distributions of the Type II error rate by method, for \(b_x = 0.3\).

Figure 10

Aside from the no covariates approach, we see that the other methods have a similar range of Type II error.

b_x = 0.5
Figure 11

We see a similar trend as above, of a large reduction in Type II error when including all covariates compared to no covariates, and another reduction when selecting covariates, especially when using r, partial r, or lasso.

distributions

Here we view the distributions of the Type II error rate by method, for \(b_x = 0.5\).

Figure 12

Again, the methods show a similar range and distribution of Type II error, apart from the no covariates method.

by n_obs

We will view the Type II error rates of each method for the different levels of the number of observations in a sample. In the tables below, we see the minimum, maximum, and average Type II error for each level of n_obs by each method, for both \(b_x = 0.3\) and \(b_x = 0.5\).

b_x = 0.3
Table 6: Type II error by n_obs, b_x = 0.3
n_obs method typeII_min typeII_max typeII_mean
50 no_covs 0.815 0.824 0.819
50 all_covs 0.662 0.860 0.778
50 r 0.628 0.816 0.750
50 partial_r 0.623 0.801 0.731
50 full_lm 0.644 0.801 0.735
50 full_lm_wo_x 0.742 0.831 0.798
50 lasso 0.557 0.800 0.713
50 lasso_wo_x 0.629 0.824 0.761
100 no_covs 0.676 0.685 0.681
100 all_covs 0.296 0.684 0.549
100 r 0.269 0.665 0.528
100 partial_r 0.266 0.657 0.520
100 full_lm 0.337 0.656 0.528
100 full_lm_wo_x 0.427 0.676 0.577
100 lasso 0.256 0.656 0.513
100 lasso_wo_x 0.280 0.670 0.540
150 no_covs 0.549 0.560 0.554
150 all_covs 0.107 0.529 0.378
150 r 0.098 0.525 0.364
150 partial_r 0.097 0.521 0.360
150 full_lm 0.141 0.521 0.366
150 full_lm_wo_x 0.187 0.535 0.393
150 lasso 0.097 0.520 0.356
150 lasso_wo_x 0.098 0.531 0.371
200 no_covs 0.436 0.443 0.440
200 all_covs 0.032 0.406 0.257
200 r 0.029 0.402 0.248
200 partial_r 0.029 0.400 0.245
200 full_lm 0.045 0.399 0.250
200 full_lm_wo_x 0.066 0.406 0.266
200 lasso 0.030 0.399 0.244
200 lasso_wo_x 0.034 0.408 0.253
300 no_covs 0.260 0.268 0.264
300 all_covs 0.003 0.230 0.118
300 r 0.003 0.229 0.114
300 partial_r 0.003 0.227 0.113
300 full_lm 0.004 0.227 0.116
300 full_lm_wo_x 0.005 0.225 0.123
300 lasso 0.003 0.227 0.113
300 lasso_wo_x 0.003 0.228 0.116
400 no_covs 0.148 0.155 0.151
400 all_covs 0.000 0.121 0.053
400 r 0.000 0.119 0.051
400 partial_r 0.000 0.119 0.051
400 full_lm 0.000 0.119 0.053
400 full_lm_wo_x 0.000 0.121 0.056
400 lasso 0.000 0.119 0.051
400 lasso_wo_x 0.000 0.122 0.053

In this table, we see that for all methods, as the sample size increases, the Type II error decreases. This can be better seen in a plot below.

Figure 13

Here, we see the decrease in Type II error for the increase in sample size across all methods. We see that the no covariates approach has the highest Type II error. From both the table and the plot, we see that for smaller sample sizes, including all covariates in the model yields higher Type II errors, but these become comparable for larger sample sizes.

b_x = 0.5
Table 7: Type II error by n_obs, b_x = 0.5
n_obs method typeII_min typeII_max typeII_mean
50 no_covs 0.585 0.595 0.589
50 all_covs 0.274 0.687 0.503
50 r 0.227 0.581 0.446
50 partial_r 0.223 0.558 0.420
50 full_lm 0.318 0.558 0.454
50 full_lm_wo_x 0.423 0.608 0.542
50 lasso 0.171 0.558 0.408
50 lasso_wo_x 0.229 0.598 0.467
100 no_covs 0.296 0.308 0.303
100 all_covs 0.014 0.304 0.171
100 r 0.010 0.276 0.151
100 partial_r 0.010 0.268 0.146
100 full_lm 0.035 0.268 0.158
100 full_lm_wo_x 0.058 0.295 0.189
100 lasso 0.009 0.270 0.144
100 lasso_wo_x 0.012 0.285 0.159
150 no_covs 0.136 0.143 0.140
150 all_covs 0.000 0.119 0.056
150 r 0.000 0.115 0.050
150 partial_r 0.000 0.113 0.049
150 full_lm 0.002 0.113 0.054
150 full_lm_wo_x 0.003 0.121 0.063
150 lasso 0.000 0.114 0.049
150 lasso_wo_x 0.000 0.114 0.053
200 no_covs 0.056 0.062 0.059
200 all_covs 0.000 0.044 0.018
200 r 0.000 0.043 0.016
200 partial_r 0.000 0.042 0.016
200 full_lm 0.000 0.042 0.018
200 full_lm_wo_x 0.000 0.044 0.021
200 lasso 0.000 0.043 0.016
200 lasso_wo_x 0.000 0.044 0.017
300 no_covs 0.008 0.010 0.009
300 all_covs 0.000 0.006 0.002
300 r 0.000 0.005 0.001
300 partial_r 0.000 0.005 0.001
300 full_lm 0.000 0.005 0.002
300 full_lm_wo_x 0.000 0.005 0.002
300 lasso 0.000 0.006 0.001
300 lasso_wo_x 0.000 0.005 0.001
400 no_covs 0.001 0.002 0.001
400 all_covs 0.000 0.001 0.000
400 r 0.000 0.000 0.000
400 partial_r 0.000 0.000 0.000
400 full_lm 0.000 0.000 0.000
400 full_lm_wo_x 0.000 0.001 0.000
400 lasso 0.000 0.000 0.000
400 lasso_wo_x 0.000 0.001 0.000

We see lower overall Type II errors, but the same trend of decreasing errors for increasing sample sizes across all methods. We can view this in the plot below.

Figure 14

Similarly, the no covariates approach has the highest Type II error. For small sample sizes, the all covariates approach has a higher error, which becomes comparable to the other methods as sample size increases.

by n_covs

We will view the Type II error rates of each method for the different number of available covariates. In the tables below, we see the minimum, maximum, and average Type II error for each level of n_covs by each method, for both \(b_x = 0.3\) and \(b_x = 0.5\).

b_x = 0.3
Table 8: Type II error by n_covs, b_x = 0.3
n_covs method typeII_min typeII_max typeII_mean
4 no_covs 0.148 0.821 0.485
4 all_covs 0.015 0.820 0.396
4 r 0.015 0.816 0.393
4 partial_r 0.015 0.801 0.390
4 full_lm 0.015 0.801 0.389
4 full_lm_wo_x 0.017 0.819 0.397
4 lasso 0.015 0.800 0.390
4 lasso_wo_x 0.017 0.820 0.396
8 no_covs 0.150 0.822 0.485
8 all_covs 0.004 0.826 0.361
8 r 0.003 0.816 0.354
8 partial_r 0.003 0.791 0.349
8 full_lm 0.004 0.790 0.349
8 full_lm_wo_x 0.003 0.822 0.368
8 lasso 0.004 0.788 0.348
8 lasso_wo_x 0.003 0.823 0.360
12 no_covs 0.150 0.821 0.485
12 all_covs 0.001 0.832 0.345
12 r 0.001 0.813 0.332
12 partial_r 0.001 0.783 0.326
12 full_lm 0.001 0.782 0.330
12 full_lm_wo_x 0.001 0.824 0.359
12 lasso 0.001 0.772 0.322
12 lasso_wo_x 0.001 0.823 0.340
16 no_covs 0.149 0.823 0.485
16 all_covs 0.000 0.843 0.339
16 r 0.000 0.812 0.320
16 partial_r 0.000 0.783 0.314
16 full_lm 0.000 0.779 0.321
16 full_lm_wo_x 0.000 0.826 0.359
16 lasso 0.000 0.760 0.306
16 lasso_wo_x 0.000 0.823 0.328
20 no_covs 0.149 0.824 0.485
20 all_covs 0.000 0.860 0.337
20 r 0.000 0.816 0.313
20 partial_r 0.000 0.791 0.305
20 full_lm 0.000 0.783 0.317
20 full_lm_wo_x 0.000 0.831 0.361
20 lasso 0.000 0.758 0.293
20 lasso_wo_x 0.000 0.824 0.321

From the average column, we can see that Type II error decreases as the number of covariates increases across all methods, except the no covariates method, as this does not depend on the number of available covariates. We can see these trends in the plot below.

Figure 15

In this plot, as the number of covariates increases, we see larger reductions in Type II error when selecting the covariates compared to including all available covariates. Lasso performs the best with higher numbers of covariates, followed by partial r, r, and full lm in order. Again, using no covariates yields the highest Type II error.

b_x = 0.5
Table 9: Type II error by n_covs, b_x = 0.5
n_covs method typeII_min typeII_max typeII_mean
4 no_covs 0.001 0.595 0.184
4 all_covs 0.000 0.589 0.134
4 r 0.000 0.581 0.132
4 partial_r 0.000 0.558 0.128
4 full_lm 0.000 0.558 0.129
4 full_lm_wo_x 0.000 0.587 0.137
4 lasso 0.000 0.558 0.129
4 lasso_wo_x 0.000 0.588 0.134
8 no_covs 0.001 0.593 0.183
8 all_covs 0.000 0.605 0.121
8 r 0.000 0.578 0.114
8 partial_r 0.000 0.539 0.109
8 full_lm 0.000 0.546 0.114
8 full_lm_wo_x 0.000 0.591 0.130
8 lasso 0.000 0.540 0.109
8 lasso_wo_x 0.000 0.588 0.119
12 no_covs 0.001 0.594 0.184
12 all_covs 0.000 0.624 0.119
12 r 0.000 0.569 0.106
12 partial_r 0.000 0.520 0.100
12 full_lm 0.000 0.538 0.110
12 full_lm_wo_x 0.000 0.605 0.134
12 lasso 0.000 0.518 0.099
12 lasso_wo_x 0.000 0.593 0.113
16 no_covs 0.001 0.593 0.184
16 all_covs 0.000 0.656 0.122
16 r 0.000 0.576 0.102
16 partial_r 0.000 0.525 0.096
16 full_lm 0.000 0.538 0.109
16 full_lm_wo_x 0.000 0.605 0.137
16 lasso 0.000 0.513 0.092
16 lasso_wo_x 0.000 0.597 0.109
20 no_covs 0.001 0.591 0.183
20 all_covs 0.000 0.687 0.128
20 r 0.000 0.575 0.100
20 partial_r 0.000 0.536 0.094
20 full_lm 0.000 0.541 0.111
20 full_lm_wo_x 0.000 0.608 0.142
20 lasso 0.000 0.500 0.087
20 lasso_wo_x 0.000 0.598 0.107

We see the same decrease in Type II errors for increases in number of covariates. We can visualize this in the plot below.

Figure 16

Similarly, we see the selection methods performing better than using all covariates. For higher numbers of covariates, we see lasso has the lowest Type II error, followed by partial r, r, and full lm. Using no covariates yields the highest Type II error.

by p_good_covs

We will view the Type II error rates of each method for the different proportions of “good” covariates. In the table below, we see the minimum, maximum, and average Type II error for each level of p_good_covs by each method, for both \(b_x = 0.3\) and \(b_x = 0.5\).

b_x = 0.3
Table 10: Type II error by p_good_covs, b_x = 0.3
p_good_covs method typeII_min typeII_max typeII_mean
0.25 no_covs 0.148 0.824 0.485
0.25 all_covs 0.007 0.860 0.392
0.25 r 0.006 0.816 0.375
0.25 partial_r 0.006 0.801 0.366
0.25 full_lm 0.006 0.801 0.367
0.25 full_lm_wo_x 0.005 0.831 0.386
0.25 lasso 0.006 0.800 0.363
0.25 lasso_wo_x 0.006 0.824 0.385
0.50 no_covs 0.149 0.823 0.485
0.50 all_covs 0.001 0.852 0.349
0.50 r 0.001 0.814 0.336
0.50 partial_r 0.001 0.795 0.330
0.50 full_lm 0.001 0.794 0.335
0.50 full_lm_wo_x 0.001 0.826 0.363
0.50 lasso 0.001 0.792 0.326
0.50 lasso_wo_x 0.001 0.818 0.343
0.75 no_covs 0.148 0.821 0.485
0.75 all_covs 0.000 0.851 0.325
0.75 r 0.000 0.808 0.317
0.75 partial_r 0.000 0.791 0.314
0.75 full_lm 0.000 0.789 0.322
0.75 full_lm_wo_x 0.000 0.825 0.357
0.75 lasso 0.000 0.788 0.306
0.75 lasso_wo_x 0.000 0.812 0.319

From the average column, we see decreases in Type II error rates for increases in the proportion of good covariates across all methods – except no covariates, as this is independent of the proportion of good covariates. We can visualize the trends more clearly in the plot below.

Figure 17

In addition to the decrease in error mentioned above, we see that including all covariates has a higher Type II error than selecting the covariates, especially for lower proportions of good covariates. For larger proportions of good covariates, we see that lasso has the lowest Type II error and that using all covariates and full lm become comparable.

b_x = 0.5
Table 11: Type II error by p_good_covs, b_x = 0.5
p_good_covs method typeII_min typeII_max typeII_mean
0.25 no_covs 0.001 0.593 0.184
0.25 all_covs 0.000 0.687 0.144
0.25 r 0.000 0.581 0.125
0.25 partial_r 0.000 0.558 0.117
0.25 full_lm 0.000 0.558 0.123
0.25 full_lm_wo_x 0.000 0.608 0.139
0.25 lasso 0.000 0.558 0.117
0.25 lasso_wo_x 0.000 0.598 0.133
0.50 no_covs 0.001 0.591 0.184
0.50 all_covs 0.000 0.674 0.121
0.50 r 0.000 0.570 0.107
0.50 partial_r 0.000 0.542 0.102
0.50 full_lm 0.000 0.546 0.112
0.50 full_lm_wo_x 0.000 0.607 0.134
0.50 lasso 0.000 0.542 0.100
0.50 lasso_wo_x 0.000 0.588 0.114
0.75 no_covs 0.001 0.595 0.184
0.75 all_covs 0.000 0.666 0.110
0.75 r 0.000 0.567 0.100
0.75 partial_r 0.000 0.537 0.097
0.75 full_lm 0.000 0.546 0.109
0.75 full_lm_wo_x 0.000 0.598 0.135
0.75 lasso 0.000 0.540 0.092
0.75 lasso_wo_x 0.000 0.569 0.102

From the table, we see decreases in Type II error for higher proportions of good covariates across all methods that include covariates. We can see the details in the plot below.

Figure 18

We again see that including all covariates has a higher Type II error rate than selecting covariates to include, although this method improves for higher proportions of good covariates. Partial r performs best for a lower proportion of good covariates while lasso performs best for a higher proportion. Similarly, full lm and all covariates approaches become comparable as the proportion increases.

by correlations

Recall that in these simulations, the correlation among the good covariates was held constant at 0.3. So, we will look at the Type II error rates of each method by the correlation between \(Y\) and the good covariates, for both \(b_x = 0.3\) and \(b_x = 0.5\).

b_x = 0.3
Table 12: Type II error by correlations, b_x = 0.3
r_ycov method typeII_min typeII_max typeII_mean
0.3 no_covs 0.148 0.824 0.485
0.3 all_covs 0.078 0.860 0.442
0.3 r 0.074 0.816 0.429
0.3 partial_r 0.073 0.801 0.421
0.3 full_lm 0.083 0.801 0.425
0.3 full_lm_wo_x 0.090 0.831 0.450
0.3 lasso 0.072 0.800 0.415
0.3 lasso_wo_x 0.074 0.824 0.434
0.5 no_covs 0.148 0.823 0.485
0.5 all_covs 0.000 0.790 0.270
0.5 r 0.000 0.781 0.255
0.5 partial_r 0.000 0.773 0.252
0.5 full_lm 0.000 0.772 0.258
0.5 full_lm_wo_x 0.000 0.805 0.287
0.5 lasso 0.000 0.769 0.248
0.5 lasso_wo_x 0.000 0.787 0.264

In the table, we see that the Type II error decreases as the correlation between \(Y\) and the good covariates increases. The Type II error is highest for including no covariates. We can visualize this below.

Figure 19

In the plots, we see that the no covariates method has the highest Type II error across correlation levels. The Type II errors decrease as the correlation between \(Y\) and the good covariates increases. We also see slight decreases in Type II error from including all covariates to selecting them.

b_x = 0.5
Table 13: Type II error by correlations, b_x = 0.5
r_ycov method typeII_min typeII_max typeII_mean
0.3 no_covs 0.001 0.595 0.184
0.3 all_covs 0.000 0.687 0.170
0.3 r 0.000 0.581 0.154
0.3 partial_r 0.000 0.558 0.146
0.3 full_lm 0.000 0.558 0.153
0.3 full_lm_wo_x 0.000 0.608 0.172
0.3 lasso 0.000 0.558 0.142
0.3 lasso_wo_x 0.000 0.598 0.159
0.5 no_covs 0.001 0.594 0.184
0.5 all_covs 0.000 0.516 0.080
0.5 r 0.000 0.499 0.067
0.5 partial_r 0.000 0.488 0.065
0.5 full_lm 0.000 0.486 0.076
0.5 full_lm_wo_x 0.000 0.563 0.100
0.5 lasso 0.000 0.487 0.064
0.5 lasso_wo_x 0.000 0.516 0.074

From the table, we see that the Type II error decreases as the correlation between \(Y\) and the good covariates increases.

Figure 20

In the plot, we see again that the no covariates approach has the highest Type II error across correlations. There is a slight decrease in Type II error when selecting covariates instead of using all covariates. Among the selection methods, full lm has a higher Type II error for a lower \(Y\)-covariate correlation, but the errors become comparable for a higher correlation.

Estimate, SD, & SE

Here we will compare, across methods, the estimate of \(b_x\), the standard deviation of the estimate, and the average standard error of the estimate. The standard deviation is calculated as the SD of the sampling distribution of the estimates. The standard error is from the linear model output. Since the mean of standard errors would be biased, we calculate the average SE by taking the square root of the mean of the squared standard errors. We compare the differences by subtracting this average linear model SE from the calculated SD.

b_x = 0.3

Table 14: Estimate, SD, SE: b_x = 0.3
method mean_estimate SD_estimate mean_SE difference
no_covs 0.300 0.178 0.178 0.000
all_covs 0.300 0.155 0.155 0.000
r 0.297 0.145 0.145 0.000
partial_r 0.300 0.147 0.145 0.003
full_lm 0.300 0.156 0.147 0.008
full_lm_wo_x 0.287 0.149 0.148 0.001
lasso 0.300 0.150 0.143 0.007
lasso_wo_x 0.292 0.144 0.143 0.000

Here we see that most methods correctly estimate \(b_x\) to be 0.3 on average, though the r, full lm without \(X\), and lasso without \(X\) approaches yielded slightly lower average estimates. The no covariates, all covariates, and r approaches show essentially no difference between the calculated SD and the linear model SE, while the partial r, full lm, and lasso approaches show slight differences.

b_x = 0.5

Table 15: Estimate, SD, SE: b_x = 0.5
method mean_estimate SD_estimate mean_SE difference
no_covs 0.500 0.178 0.178 0.000
all_covs 0.500 0.155 0.155 0.000
r 0.495 0.145 0.145 0.000
partial_r 0.500 0.147 0.145 0.003
full_lm 0.500 0.155 0.147 0.008
full_lm_wo_x 0.479 0.151 0.148 0.002
lasso 0.500 0.150 0.143 0.007
lasso_wo_x 0.487 0.144 0.143 0.001

Similarly, most methods correctly estimate \(b_x\) to be 0.5 on average, with the r, full lm without \(X\), and lasso without \(X\) approaches again yielding slightly lower estimates. The no covariates, all covariates, and r approaches show essentially no difference between the calculated SD and the linear model SE, while the partial r, full lm, and lasso approaches show slight differences.

Sampling Distributions

Here we view sampling distributions of the estimate for \(b_x\) for each method, for both \(b_x = 0.3\) and \(b_x = 0.5\).

b_x = 0.3

Warning: Removed 522893 rows containing non-finite outside the scale range
(`stat_density()`).
Figure 21

In the plot, we can see that the distributions for all methods are centered around 0.3. The no covariates approach again has the widest distribution.

b_x = 0.5

Warning: Removed 529644 rows containing non-finite outside the scale range
(`stat_density()`).
Figure 22

In the plot, we can see that the distributions for all methods are centered around 0.5. The no covariates approach again has the widest distribution.

From these secondary analyses, we see that using no covariates leads to the highest Type II error rates across research settings. Using all covariates provides a reduction in Type II error (and thus more power), and using a non-trivial selection method leads to further reductions.

Good covariates true positive & false positive rates

Finally, we will look at the true positive rate (TPR) and false positive rate (FPR) of including good covariates for each method, to compare how well each method included covariates that were part of the data-generating process. Ideally, we want a high TPR, indicating that a method includes a large proportion of the good covariates, and a low FPR, indicating that it rarely includes the bad covariates. These rates are displayed in the tables below. We include the p-hacked method again.
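Concretely, if a research setting has \(G\) good and \(B\) bad covariates available (our notation), the per-simulation rates are:

\[
TPR = \frac{\#\{\text{good covariates included}\}}{G}, \qquad FPR = \frac{\#\{\text{bad covariates included}\}}{B},
\]

which are then averaged over the 20,000 simulations in each setting to give the mean rates reported below.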

It also makes sense to consider the TPR and FPR of including good covariates alongside the actual number of good covariates. We will look at plots of these below as well, for \(b_x = 0, 0.3, 0.5\).

True Positive Rate (TPR)

b_x = 0

Table 16: TPR, b_x = 0
method covs_tpr_mean
no_covs 0.000
all_covs 1.000
p_hacked 0.496
r 0.947
partial_r 0.945
full_lm 0.641
full_lm_wo_x 0.643
lasso 0.888
lasso_wo_x 0.888

On average, we see that r, partial r, and lasso perform the best, while full lm has a considerably lower TPR.

Figure 23

As the number of good covariates increases, we see that full lm has a large decrease in the TPR and lasso has a smaller decrease. The other methods remain relatively stable. We see that the p-hacked method has a relatively flat rate of 50%, indicating that it finds only half of the good covariates.

b_x = 0.3

Table 17: TPR, b_x = 0.3
method covs_tpr_mean
no_covs 0.000
all_covs 1.000
p_hacked 0.656
r 0.944
partial_r 0.945
full_lm 0.641
full_lm_wo_x 0.635
lasso 0.888
lasso_wo_x 0.884

Similarly, r, partial r, and lasso methods have a higher TPR than full lm.

Figure 24

We see a similar trend in large decreases for full lm and smaller decreases for lasso in the TPR as the number of good covariates increases.

b_x = 0.5

Table 18: TPR, b_x = 0.5
method covs_tpr_mean
no_covs 0.000
all_covs 1.000
p_hacked 0.747
r 0.940
partial_r 0.945
full_lm 0.641
full_lm_wo_x 0.620
lasso 0.888
lasso_wo_x 0.877

Again we see that r, partial r, and lasso perform the best. In this case with a larger \(X\) effect, we see partial r starting to outperform the r method in terms of TPR.

Figure 25

We see the same decreases in TPR for full lm and lasso as the number of good covariates increases.

False Positive Rate (FPR)

b_x = 0

Table 19: FPR, b_x = 0
method covs_fpr_mean
no_covs 0.000
all_covs 1.000
p_hacked 0.466
r 0.050
partial_r 0.050
full_lm 0.050
full_lm_wo_x 0.050
lasso 0.313
lasso_wo_x 0.310

For the FPR, we see that r, partial r, and full lm are all comparable, while lasso performs worse than these methods.

Figure 26

We see that as the number of good covariates increases, most methods remain stable in their FPR, except for lasso.

b_x = 0.3

Table 20: FPR, b_x = 0.3
method covs_fpr_mean
no_covs 0.000
all_covs 1.000
p_hacked 0.415
r 0.050
partial_r 0.050
full_lm 0.050
full_lm_wo_x 0.050
lasso 0.314
lasso_wo_x 0.308

Similarly, r, partial r, and full lm are all comparable, while lasso performs worse than these methods.

Figure 27

The valid selection methods are again comparable, except for lasso which has a changing FPR as the number of good covariates increases.

b_x = 0.5

Table 21: FPR, b_x = 0.5
method covs_fpr_mean
no_covs 0.000
all_covs 1.000
p_hacked 0.375
r 0.050
partial_r 0.050
full_lm 0.050
full_lm_wo_x 0.050
lasso 0.313
lasso_wo_x 0.304

Again, r, partial r, and full lm are all comparable, while lasso performs worse, approaching the FPR of the p-hacked method.

Figure 28

The lasso method shows the same trend in FPR here, while the other methods remain stable.

Conclusions

We compared 9 methods for selecting covariates to include in linear models. In the first section, looking at Type I errors, we demonstrated that the p-hacking approach is not a statistically valid method, as it led to inflated Type I error rates and biased parameter estimates (Figure 8). The remaining 8 methods were all shown to be statistically valid, although partial r, lasso, and full lm showed slight inflation of Type I error (Figure 1). The methods can be further compared by their Type II error results. Overall, using no covariates performed the worst, as it led to the highest Type II error (and thus the lowest power). The no covariates method also showed more variability in the estimate for \(b_x\) (Figure 8, Figure 21, Figure 22). Including all covariates led to reductions in Type II error, and using one of the selection methods led to further reductions.

The Type II error of the selection methods can be examined more closely across research settings. For smaller sample sizes, we see the benefit of using a selection method over using all covariates (Figure 13, Figure 14). Lasso yielded slightly inflated Type I error, but it performs best for larger numbers of covariates (Figure 15, Figure 16) and for larger proportions of good covariates (Figure 17, Figure 18). This can be further seen when looking at the number of good covariates, where we again see lasso outperform the other methods. However, as the number of good covariates increases, the true positive rate of lasso including these good covariates decreases compared to r and partial r (Figure 23). While the bivariate correlation method shows no inflation of Type I error, it does underestimate \(b_x\) for both \(b_x = 0.3\) (Table 14) and \(b_x = 0.5\) (Table 15). While the partial correlation method shows slight inflation of Type I error, it correctly estimates \(b_x\). For smaller correlations between \(Y\) and the covariates, we again see the benefit of selecting covariates rather than using all, especially for a larger \(X\) effect (Figure 20). Among the valid non-trivial selection methods, full lm leads to the highest Type I error. When considering larger numbers of good covariates, the Type II error increases for full lm, which, in some cases, performed even worse than including all covariates.

Overall, we can narrow the options for selection methods to bivariate correlation, partial correlation, full lm, and lasso. One of these methods may show more benefits to power depending on the research context. While they each have advantages and disadvantages, researchers should also consider the complexity of these methods when weighing the possible options for selecting covariates.

The total amount of data simulated equals 97,200,000 observations = 540 unique research settings \(\times\) 20,000 simulations each \(\times\) 9 methods. We will consider the data separated by the \(X\) effect.

Data for \(b_x = 0\), as an example, is shown below.

Rows: 16,800,000
Columns: 16
$ job_num       <int> 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 10~
$ method        <chr> "no_covs", "all_covs", "p_hacked", "r", "partial_r", "fu~
$ simulation_id <int> 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3,~
$ estimate      <dbl> 0.2158, 0.1872, 0.2218, 0.1812, 0.1812, 0.1812, 0.1872, ~
$ SE            <dbl> 0.157, 0.151, 0.157, 0.151, 0.151, 0.151, 0.151, 0.156, ~
$ p_value       <dbl> 0.171, 0.218, 0.159, 0.234, 0.234, 0.234, 0.217, 0.515, ~
$ ndf           <int> 1, 5, 3, 2, 2, 2, 4, 1, 5, 4, 2, 2, 2, 3, 1, 5, 1, 2, 2,~
$ ddf           <int> 148, 144, 146, 147, 147, 147, 145, 148, 144, 145, 147, 1~
$ covs_tpr      <dbl> 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1,~
$ covs_fpr      <dbl> 0.000, 1.000, 0.667, 0.000, 0.000, 0.000, 0.667, 0.000, ~
$ n_obs         <int> 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 1~
$ b_x           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ n_covs        <int> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,~
$ r_ycov        <dbl> 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0~
$ p_good_covs   <dbl> 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.~
$ r_cov         <dbl> 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0~

Zero \(X\) Effect

First, we look at the zero \(X\) effect condition to compare the Type I errors across methods and research settings.

Type I error

We will look at the overall Type I error across methods, then will compare the error of each method across each of the manipulated variables: n_obs, n_covs, p_good_covs, r_ycov, and r_cov.

by method

We will first consider the Type I error by the selection method. Here we calculate the proportion of significant effects (\(p < 0.05\)), the Type I error, displayed below as a bar plot.
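Since \(b_x = 0\) in this condition, every significant result is a false positive, so the empirical Type I error for a setting is simply the rejection rate across its \(K = 20{,}000\) simulations (notation ours):

\[
\widehat{\text{Type I error}} = \frac{1}{K}\sum_{k=1}^{K}\mathbf{1}\!\left(p_k < 0.05\right),
\]

where \(p_k\) is the p-value for the \(X\) effect in simulation \(k\). A valid method should keep this rate near \(\alpha = 0.05\).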

Figure 1

From this bar plot, we see that the p-hacking method for selecting covariates leads to inflated Type I error rates. The no covariates, all covariates, and bivariate correlation (“r”) approaches are all at the expected 0.05 mark (\(\alpha = 0.05\)), while the partial correlation (“partial r”) approach shows slight inflation, with full lm and lasso showing further inflation of Type I error.

distributions

Here we view the distributions of the Type I error rate by method, beginning by isolating the p-hacked method.

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Figure 2

We see that while the average Type I error for the p-hacked method was 0.198, it reached as high as 0.462, further emphasizing the inflation of error.

Removing this method, we can view the distributions of the remaining 6 methods.

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Figure 3

We see that the no covariates, all covariates, and r selection methods show approximately normal distributions centered around 0.05, while partial r, full lm, and lasso are slightly right-skewed, with full lm having the greatest skew.

by n_obs

We will view the Type I error rates of each method for the different levels of the number of observations in a sample. In the table below, we see the minimum, maximum, and average Type I error for each level of n_obs by each method.

Table 1: Type I error by n_obs
n_obs method typeI_min typeI_max typeI_mean
50 no_covs 0.048 0.053 0.050
50 all_covs 0.048 0.052 0.050
50 p_hacked 0.084 0.426 0.215
50 r 0.047 0.052 0.050
50 partial_r 0.050 0.072 0.057
50 full_lm 0.054 0.096 0.074
50 full_lm_wo_x 0.048 0.052 0.050
50 lasso 0.052 0.089 0.069
50 lasso_wo_x 0.048 0.052 0.050
100 no_covs 0.048 0.052 0.050
100 all_covs 0.048 0.053 0.050
100 p_hacked 0.078 0.449 0.203
100 r 0.048 0.053 0.050
100 partial_r 0.050 0.059 0.052
100 full_lm 0.051 0.087 0.064
100 full_lm_wo_x 0.047 0.054 0.050
100 lasso 0.051 0.067 0.058
100 lasso_wo_x 0.048 0.052 0.050
150 no_covs 0.047 0.053 0.050
150 all_covs 0.047 0.052 0.050
150 p_hacked 0.076 0.454 0.198
150 r 0.048 0.051 0.050
150 partial_r 0.049 0.054 0.051
150 full_lm 0.049 0.081 0.059
150 full_lm_wo_x 0.047 0.052 0.050
150 lasso 0.049 0.062 0.055
150 lasso_wo_x 0.047 0.052 0.050
200 no_covs 0.048 0.052 0.050
200 all_covs 0.048 0.053 0.050
200 p_hacked 0.074 0.456 0.194
200 r 0.047 0.053 0.050
200 partial_r 0.048 0.054 0.051
200 full_lm 0.048 0.071 0.056
200 full_lm_wo_x 0.048 0.052 0.050
200 lasso 0.049 0.059 0.054
200 lasso_wo_x 0.048 0.052 0.050
300 no_covs 0.047 0.053 0.050
300 all_covs 0.048 0.053 0.050
300 p_hacked 0.073 0.455 0.189
300 r 0.048 0.053 0.050
300 partial_r 0.048 0.053 0.051
300 full_lm 0.049 0.061 0.054
300 full_lm_wo_x 0.049 0.052 0.050
300 lasso 0.049 0.056 0.052
300 lasso_wo_x 0.048 0.052 0.050
400 no_covs 0.048 0.053 0.050
400 all_covs 0.047 0.052 0.050
400 p_hacked 0.071 0.462 0.186
400 r 0.047 0.052 0.050
400 partial_r 0.047 0.053 0.051
400 full_lm 0.047 0.058 0.053
400 full_lm_wo_x 0.047 0.052 0.050
400 lasso 0.047 0.055 0.052
400 lasso_wo_x 0.047 0.052 0.050

Looking at the average column, we see that the Type I error is not affected much for the no covariates, all covariates, r, and partial r approaches across different sample sizes. However, for p-hacked, full lm, and lasso, the Type I error does decrease as sample size increases. This can be visualized in the plot below.

Note: For the line plots throughout this report, a solid line will indicate a method that does not involve performing selection of covariates (i.e., using no or all covariates). A dashed line will indicate a method that does involve performing a non-trivial selection of covariates. A dotted black line will indicate the expected value (if applicable). For example, here we see a dotted line at \(\alpha = 0.05\), the expected Type I error rate.

Figure 4

Again, we see the inflation of Type I error for p-hacking. For smaller sample sizes, we see that lasso and full lm perform worse than the other methods, but the methods become comparable as sample size increases.

by n_covs

We will view the Type I error rates of each method for the different number of available covariates (Note: this is not necessarily the number of covariates included in the model.). In the table below, we see the minimum, maximum, and average Type I error for each level of n_covs by each method.

Table 2: Type I error by n_covs
n_covs method typeI_min typeI_max typeI_mean
4 no_covs 0.047 0.052 0.050
4 all_covs 0.047 0.053 0.050
4 p_hacked 0.071 0.140 0.098
4 r 0.047 0.053 0.050
4 partial_r 0.047 0.058 0.051
4 full_lm 0.047 0.062 0.053
4 full_lm_wo_x 0.048 0.052 0.050
4 lasso 0.047 0.058 0.052
4 lasso_wo_x 0.048 0.052 0.050
8 no_covs 0.047 0.052 0.050
8 all_covs 0.048 0.053 0.050
8 p_hacked 0.094 0.238 0.150
8 r 0.047 0.052 0.050
8 partial_r 0.048 0.062 0.052
8 full_lm 0.048 0.082 0.056
8 full_lm_wo_x 0.047 0.054 0.050
8 lasso 0.049 0.067 0.054
8 lasso_wo_x 0.047 0.052 0.050
12 no_covs 0.049 0.052 0.050
12 all_covs 0.047 0.052 0.050
12 p_hacked 0.117 0.329 0.201
12 r 0.047 0.052 0.050
12 partial_r 0.047 0.064 0.052
12 full_lm 0.047 0.093 0.060
12 full_lm_wo_x 0.047 0.052 0.050
12 lasso 0.048 0.074 0.057
12 lasso_wo_x 0.047 0.052 0.050
16 no_covs 0.048 0.053 0.050
16 all_covs 0.048 0.053 0.050
16 p_hacked 0.139 0.404 0.249
16 r 0.047 0.053 0.050
16 partial_r 0.050 0.069 0.053
16 full_lm 0.051 0.094 0.064
16 full_lm_wo_x 0.047 0.052 0.050
16 lasso 0.051 0.080 0.059
16 lasso_wo_x 0.048 0.052 0.050
20 no_covs 0.048 0.051 0.050
20 all_covs 0.047 0.052 0.050
20 p_hacked 0.165 0.462 0.290
20 r 0.048 0.052 0.050
20 partial_r 0.048 0.072 0.053
20 full_lm 0.050 0.096 0.066
20 full_lm_wo_x 0.048 0.052 0.050
20 lasso 0.050 0.089 0.061
20 lasso_wo_x 0.047 0.052 0.050

Looking at the average column, we see that the Type I error is not affected much for the no covariates, all covariates, r, and partial r approaches across different amounts of available covariates. However, for p-hacked, full lm, and lasso, the Type I error increases as the number of covariates increases. This can be visualized in the plot below.

Figure 5

We see the large increase in error for the p-hacked method and the slight increase for full lm and lasso methods as the number of covariates increases, while the other methods stay around 0.05.

by p_good_covs

We will view the Type I error rates of each method for the different proportions of good covariates. In the table below, we see the minimum, maximum, and average Type I error for each level of p_good_covs by each method.

Table 3: Type I error by p_good_covs
p_good_covs method typeI_min typeI_max typeI_mean
0.25 no_covs 0.048 0.053 0.050
0.25 all_covs 0.047 0.052 0.050
0.25 p_hacked 0.071 0.344 0.163
0.25 r 0.048 0.053 0.050
0.25 partial_r 0.049 0.072 0.053
0.25 full_lm 0.050 0.090 0.057
0.25 full_lm_wo_x 0.047 0.052 0.050
0.25 lasso 0.050 0.089 0.058
0.25 lasso_wo_x 0.047 0.052 0.050
0.50 no_covs 0.047 0.053 0.050
0.50 all_covs 0.047 0.053 0.050
0.50 p_hacked 0.084 0.400 0.202
0.50 r 0.047 0.052 0.050
0.50 partial_r 0.048 0.063 0.052
0.50 full_lm 0.048 0.094 0.060
0.50 full_lm_wo_x 0.047 0.054 0.050
0.50 lasso 0.049 0.084 0.057
0.50 lasso_wo_x 0.047 0.052 0.050
0.75 no_covs 0.048 0.053 0.050
0.75 all_covs 0.047 0.053 0.050
0.75 p_hacked 0.093 0.462 0.227
0.75 r 0.047 0.053 0.050
0.75 partial_r 0.047 0.059 0.051
0.75 full_lm 0.047 0.096 0.062
0.75 full_lm_wo_x 0.047 0.052 0.050
0.75 lasso 0.047 0.081 0.055
0.75 lasso_wo_x 0.048 0.052 0.050

Looking at the average column, we see that the Type I error is not affected for the no covariates, all covariates, and r approaches across different proportions. However, for the p-hacked, partial r, full lm, and lasso methods, the Type I error changes as the proportion of good covariates increases. This can be visualized in the plots below.

Figure 6

In this plot, we mainly see the increase in Type I error for the p-hacking approach as the proportion of good covariates increases. As there are more good covariates, the Type I error decreases for lasso and partial r, but it increases for full lm.

by correlations

Recall that in these simulations, the correlation among the good covariates was held constant at 0.3. So, we will look at the Type I error rates of each method by the correlation between \(Y\) and the good covariates.

Table 4: Type I error by y-cov correlations
r_ycov method typeI_min typeI_max typeI_mean
0.3 no_covs 0.047 0.053 0.050
0.3 all_covs 0.048 0.053 0.050
0.3 p_hacked 0.071 0.276 0.145
0.3 r 0.047 0.053 0.050
0.3 partial_r 0.048 0.072 0.053
0.3 full_lm 0.050 0.075 0.059
0.3 full_lm_wo_x 0.047 0.054 0.050
0.3 lasso 0.049 0.085 0.058
0.3 lasso_wo_x 0.048 0.052 0.050
0.5 no_covs 0.047 0.053 0.050
0.5 all_covs 0.047 0.053 0.050
0.5 p_hacked 0.082 0.462 0.250
0.5 r 0.047 0.053 0.050
0.5 partial_r 0.047 0.058 0.051
0.5 full_lm 0.047 0.096 0.060
0.5 full_lm_wo_x 0.047 0.052 0.050
0.5 lasso 0.047 0.089 0.056
0.5 lasso_wo_x 0.047 0.052 0.050

Looking at the average column, we see that the error does not change across correlations for the no covariates, all covariates, r, and partial r approaches. It changes slightly for full lm and lasso, and it changes drastically for p-hacking, such that a higher correlation between \(Y\) and the good covariates increases the Type I error.

Figure 7

In the plot, we see the Type I error for the p-hacking method increase as the correlation between \(Y\) and the good covariates increases. The other methods remain at relatively the same Type I error rate across correlation values.

Estimate, SD, & SE

Here we will compare, across methods, the estimate of \(b_x\), the standard deviation of the estimate, and the average standard error of the estimate. The standard deviation is calculated as the SD of the sampling distribution of the estimates. The standard error is from the linear model output. Since the mean of standard errors would be biased, we calculate the average SE by taking the square root of the mean of the squared standard errors. We compare the differences by subtracting this average linear model SE from the calculated SD.

Table 5: b_x = 0
method mean_estimate SD_estimate SE_mean difference
no_covs -0.000 0.178 0.178 0.000
all_covs 0.000 0.155 0.155 0.000
p_hacked 0.000 0.238 0.157 0.080
r 0.000 0.145 0.145 0.000
partial_r 0.000 0.147 0.145 0.003
full_lm 0.000 0.156 0.147 0.008
full_lm_wo_x 0.000 0.147 0.147 0.000
lasso 0.000 0.150 0.143 0.007
lasso_wo_x 0.000 0.143 0.143 0.000

We see that all methods have an average estimate of 0, as expected given the population value for \(b_x\). The standard deviation of the sampling distribution of the estimate equals the average standard error for the no covariates, all covariates, and r selection methods. There are small differences between these values for the partial r, full lm, and lasso methods. The p-hacked method shows a large difference.

Sampling Distributions

Here we view a sampling distribution of the estimate for \(b_x\) for each method.

Warning: Removed 798111 rows containing non-finite outside the scale range
(`stat_density()`).
Figure 8

We see again that each method’s distribution is centered around 0, but the p-hacked method is not normally distributed as it is biasing the parameter estimates. In addition, we see that the no covariates method has more variability in its estimate for \(b_x\) than the other valid selection methods.

From these primary analyses, we see that the p-hacked method leads to inflated Type I error rates and biased parameter estimates. We can conclude that it is not a statistically valid method. For the following analyses, we will not include the p-hacked method. While partial r, full lm, and lasso selection methods showed slight inflation of Type I error, we might be willing to accept this for greater reductions in Type II error, which we will compare in the next section.

Nonzero \(X\) Effect

Next, we look at the nonzero \(X\) effect condition to compare the Type II errors across methods and research settings. Recall that we set two values for \(b_x\) of 0.3 and 0.5.

Type II Error

We will look at the overall Type II error across methods (except p-hacked), then will again compare the error of each method across each of the manipulated variables: n_obs, n_covs, p_good_covs, r_ycov, and r_cov.

by method

We will first consider the Type II error by the selection method, for both \(b_x = 0.3\) and \(b_x = 0.5\). Here we calculate the proportion of non-significant effects (\(p \geq 0.05\)), the Type II error, displayed below as bar plots.
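Since \(b_x \neq 0\) in this condition, every non-significant result is a miss, so the empirical Type II error for a setting (notation ours, mirroring the Type I calculation) is:

\[
\widehat{\text{Type II error}} = \frac{1}{K}\sum_{k=1}^{K}\mathbf{1}\!\left(p_k \geq 0.05\right) = 1 - \text{power},
\]

where \(p_k\) is the p-value for the \(X\) effect in simulation \(k\) and \(K = 20{,}000\).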

b_x = 0.3
Figure 9

In this first plot for \(b_x = 0.3\), we see that the Type II error is highest when no covariates are used in the model. There is a large reduction in Type II error when we include all covariates compared to no covariates, and a slight further reduction in Type II error when we use a selection method for covariates compared to including all.

distributions

Here we view the distributions of the Type II error rate by method, for \(b_x = 0.3\).

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Figure 10

Aside from the no covariates approach, we see that the other methods have a similar range of Type II error.

b_x = 0.5
Figure 11

We see a similar trend as above, of a large reduction in Type II error when including all covariates compared to no covariates, and another reduction when selecting covariates, especially when using r, partial r, or lasso.

distributions

Here we view the distributions of the Type II error rate by method, for \(b_x = 0.5\).

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Figure 12

The selection methods again show a similar range and distribution of Type II error, apart from the no covariates method.

by n_obs

We will view the Type II error rates of each method for the different levels of the number of observations in a sample. In the tables below, we see the minimum, maximum, and average Type II error for each level of n_obs by each method, for both \(b_x = 0.3\) and \(b_x = 0.5\).

b_x = 0.3
Table 6: Type II error by n_obs, b_x = 0.3
n_obs method typeII_min typeII_max typeII_mean
50 no_covs 0.815 0.824 0.819
50 all_covs 0.662 0.860 0.778
50 r 0.628 0.816 0.750
50 partial_r 0.623 0.801 0.731
50 full_lm 0.644 0.801 0.735
50 full_lm_wo_x 0.742 0.831 0.798
50 lasso 0.557 0.800 0.713
50 lasso_wo_x 0.629 0.824 0.761
100 no_covs 0.676 0.685 0.681
100 all_covs 0.296 0.684 0.549
100 r 0.269 0.665 0.528
100 partial_r 0.266 0.657 0.520
100 full_lm 0.337 0.656 0.528
100 full_lm_wo_x 0.427 0.676 0.577
100 lasso 0.256 0.656 0.513
100 lasso_wo_x 0.280 0.670 0.540
150 no_covs 0.549 0.560 0.554
150 all_covs 0.107 0.529 0.378
150 r 0.098 0.525 0.364
150 partial_r 0.097 0.521 0.360
150 full_lm 0.141 0.521 0.366
150 full_lm_wo_x 0.187 0.535 0.393
150 lasso 0.097 0.520 0.356
150 lasso_wo_x 0.098 0.531 0.371
200 no_covs 0.436 0.443 0.440
200 all_covs 0.032 0.406 0.257
200 r 0.029 0.402 0.248
200 partial_r 0.029 0.400 0.245
200 full_lm 0.045 0.399 0.250
200 full_lm_wo_x 0.066 0.406 0.266
200 lasso 0.030 0.399 0.244
200 lasso_wo_x 0.034 0.408 0.253
300 no_covs 0.260 0.268 0.264
300 all_covs 0.003 0.230 0.118
300 r 0.003 0.229 0.114
300 partial_r 0.003 0.227 0.113
300 full_lm 0.004 0.227 0.116
300 full_lm_wo_x 0.005 0.225 0.123
300 lasso 0.003 0.227 0.113
300 lasso_wo_x 0.003 0.228 0.116
400 no_covs 0.148 0.155 0.151
400 all_covs 0.000 0.121 0.053
400 r 0.000 0.119 0.051
400 partial_r 0.000 0.119 0.051
400 full_lm 0.000 0.119 0.053
400 full_lm_wo_x 0.000 0.121 0.056
400 lasso 0.000 0.119 0.051
400 lasso_wo_x 0.000 0.122 0.053

In this table, we see that for all methods, as the sample size increases, the Type II error decreases. This can be better seen in a plot below.

Figure 13

Here, we see the decrease in Type II error for the increase in sample size across all methods. We see that the no covariates approach has the highest Type II error. From both the table and the plot, we see that for smaller sample sizes, including all covariates in the model yields higher Type II errors, but these become comparable for larger sample sizes.

b_x = 0.5
Table 7: Type II error by n_obs, b_x = 0.5
n_obs method typeII_min typeII_max typeII_mean
50 no_covs 0.585 0.595 0.589
50 all_covs 0.274 0.687 0.503
50 r 0.227 0.581 0.446
50 partial_r 0.223 0.558 0.420
50 full_lm 0.318 0.558 0.454
50 full_lm_wo_x 0.423 0.608 0.542
50 lasso 0.171 0.558 0.408
50 lasso_wo_x 0.229 0.598 0.467
100 no_covs 0.296 0.308 0.303
100 all_covs 0.014 0.304 0.171
100 r 0.010 0.276 0.151
100 partial_r 0.010 0.268 0.146
100 full_lm 0.035 0.268 0.158
100 full_lm_wo_x 0.058 0.295 0.189
100 lasso 0.009 0.270 0.144
100 lasso_wo_x 0.012 0.285 0.159
150 no_covs 0.136 0.143 0.140
150 all_covs 0.000 0.119 0.056
150 r 0.000 0.115 0.050
150 partial_r 0.000 0.113 0.049
150 full_lm 0.002 0.113 0.054
150 full_lm_wo_x 0.003 0.121 0.063
150 lasso 0.000 0.114 0.049
150 lasso_wo_x 0.000 0.114 0.053
200 no_covs 0.056 0.062 0.059
200 all_covs 0.000 0.044 0.018
200 r 0.000 0.043 0.016
200 partial_r 0.000 0.042 0.016
200 full_lm 0.000 0.042 0.018
200 full_lm_wo_x 0.000 0.044 0.021
200 lasso 0.000 0.043 0.016
200 lasso_wo_x 0.000 0.044 0.017
300 no_covs 0.008 0.010 0.009
300 all_covs 0.000 0.006 0.002
300 r 0.000 0.005 0.001
300 partial_r 0.000 0.005 0.001
300 full_lm 0.000 0.005 0.002
300 full_lm_wo_x 0.000 0.005 0.002
300 lasso 0.000 0.006 0.001
300 lasso_wo_x 0.000 0.005 0.001
400 no_covs 0.001 0.002 0.001
400 all_covs 0.000 0.001 0.000
400 r 0.000 0.000 0.000
400 partial_r 0.000 0.000 0.000
400 full_lm 0.000 0.000 0.000
400 full_lm_wo_x 0.000 0.001 0.000
400 lasso 0.000 0.000 0.000
400 lasso_wo_x 0.000 0.001 0.000

We see lower overall Type II errors, but the same trend of decreasing errors for increasing sample sizes across all methods. We can view this in the plot below.

Figure 14

Similarly, the no covariates approach has the highest Type II error. For small sample sizes, the all covariates approach has higher error, but the methods become comparable as sample size increases.

by n_covs

We will view the Type II error rates of each method for the different number of available covariates. In the tables below, we see the minimum, maximum, and average Type II error for each level of n_covs by each method, for both \(b_x = 0.3\) and \(b_x = 0.5\).

b_x = 0.3
Table 8: Type II error by n_covs, b_x = 0.3
n_covs method typeII_min typeII_max typeII_mean
4 no_covs 0.148 0.821 0.485
4 all_covs 0.015 0.820 0.396
4 r 0.015 0.816 0.393
4 partial_r 0.015 0.801 0.390
4 full_lm 0.015 0.801 0.389
4 full_lm_wo_x 0.017 0.819 0.397
4 lasso 0.015 0.800 0.390
4 lasso_wo_x 0.017 0.820 0.396
8 no_covs 0.150 0.822 0.485
8 all_covs 0.004 0.826 0.361
8 r 0.003 0.816 0.354
8 partial_r 0.003 0.791 0.349
8 full_lm 0.004 0.790 0.349
8 full_lm_wo_x 0.003 0.822 0.368
8 lasso 0.004 0.788 0.348
8 lasso_wo_x 0.003 0.823 0.360
12 no_covs 0.150 0.821 0.485
12 all_covs 0.001 0.832 0.345
12 r 0.001 0.813 0.332
12 partial_r 0.001 0.783 0.326
12 full_lm 0.001 0.782 0.330
12 full_lm_wo_x 0.001 0.824 0.359
12 lasso 0.001 0.772 0.322
12 lasso_wo_x 0.001 0.823 0.340
16 no_covs 0.149 0.823 0.485
16 all_covs 0.000 0.843 0.339
16 r 0.000 0.812 0.320
16 partial_r 0.000 0.783 0.314
16 full_lm 0.000 0.779 0.321
16 full_lm_wo_x 0.000 0.826 0.359
16 lasso 0.000 0.760 0.306
16 lasso_wo_x 0.000 0.823 0.328
20 no_covs 0.149 0.824 0.485
20 all_covs 0.000 0.860 0.337
20 r 0.000 0.816 0.313
20 partial_r 0.000 0.791 0.305
20 full_lm 0.000 0.783 0.317
20 full_lm_wo_x 0.000 0.831 0.361
20 lasso 0.000 0.758 0.293
20 lasso_wo_x 0.000 0.824 0.321

From the average column, we can see that Type II error decreases as the number of covariates increases across all methods, except the no covariates method, as this does not depend on the number of covariates. We can see these trends in the plot below.

Figure 15

In this plot, as the number of covariates increases, we see larger reductions in Type II error when selecting the covariates compared to including all available covariates. Lasso performs the best with higher numbers of covariates, followed by partial r, r, and full lm in order. Again, using no covariates yields the highest Type II error.

b_x = 0.5
Table 9: Type II error by n_covs, b_x = 0.5
n_covs method typeII_min typeII_max typeII_mean
4 no_covs 0.001 0.595 0.184
4 all_covs 0.000 0.589 0.134
4 r 0.000 0.581 0.132
4 partial_r 0.000 0.558 0.128
4 full_lm 0.000 0.558 0.129
4 full_lm_wo_x 0.000 0.587 0.137
4 lasso 0.000 0.558 0.129
4 lasso_wo_x 0.000 0.588 0.134
8 no_covs 0.001 0.593 0.183
8 all_covs 0.000 0.605 0.121
8 r 0.000 0.578 0.114
8 partial_r 0.000 0.539 0.109
8 full_lm 0.000 0.546 0.114
8 full_lm_wo_x 0.000 0.591 0.130
8 lasso 0.000 0.540 0.109
8 lasso_wo_x 0.000 0.588 0.119
12 no_covs 0.001 0.594 0.184
12 all_covs 0.000 0.624 0.119
12 r 0.000 0.569 0.106
12 partial_r 0.000 0.520 0.100
12 full_lm 0.000 0.538 0.110
12 full_lm_wo_x 0.000 0.605 0.134
12 lasso 0.000 0.518 0.099
12 lasso_wo_x 0.000 0.593 0.113
16 no_covs 0.001 0.593 0.184
16 all_covs 0.000 0.656 0.122
16 r 0.000 0.576 0.102
16 partial_r 0.000 0.525 0.096
16 full_lm 0.000 0.538 0.109
16 full_lm_wo_x 0.000 0.605 0.137
16 lasso 0.000 0.513 0.092
16 lasso_wo_x 0.000 0.597 0.109
20 no_covs 0.001 0.591 0.183
20 all_covs 0.000 0.687 0.128
20 r 0.000 0.575 0.100
20 partial_r 0.000 0.536 0.094
20 full_lm 0.000 0.541 0.111
20 full_lm_wo_x 0.000 0.608 0.142
20 lasso 0.000 0.500 0.087
20 lasso_wo_x 0.000 0.598 0.107

We see the same decrease in Type II errors for increases in number of covariates. We can visualize this in the plot below.

Figure 16

Similarly, we see the selection methods performing better than using all covariates. For higher numbers of covariates, we see lasso has the lowest Type II error, followed by partial r, r, and full lm. Using no covariates yields the highest Type II error.

by p_good_covs

We will view the Type II error rates of each method for the different proportions of “good” covariates. In the tables below, we see the minimum, maximum, and average Type II error for each level of p_good_covs by each method, for both \(b_x = 0.3\) and \(b_x = 0.5\).

b_x = 0.3
Table 10: Type II error by p_good_covs, b_x = 0.3
p_good_covs method typeII_min typeII_max typeII_mean
0.25 no_covs 0.148 0.824 0.485
0.25 all_covs 0.007 0.860 0.392
0.25 r 0.006 0.816 0.375
0.25 partial_r 0.006 0.801 0.366
0.25 full_lm 0.006 0.801 0.367
0.25 full_lm_wo_x 0.005 0.831 0.386
0.25 lasso 0.006 0.800 0.363
0.25 lasso_wo_x 0.006 0.824 0.385
0.50 no_covs 0.149 0.823 0.485
0.50 all_covs 0.001 0.852 0.349
0.50 r 0.001 0.814 0.336
0.50 partial_r 0.001 0.795 0.330
0.50 full_lm 0.001 0.794 0.335
0.50 full_lm_wo_x 0.001 0.826 0.363
0.50 lasso 0.001 0.792 0.326
0.50 lasso_wo_x 0.001 0.818 0.343
0.75 no_covs 0.148 0.821 0.485
0.75 all_covs 0.000 0.851 0.325
0.75 r 0.000 0.808 0.317
0.75 partial_r 0.000 0.791 0.314
0.75 full_lm 0.000 0.789 0.322
0.75 full_lm_wo_x 0.000 0.825 0.357
0.75 lasso 0.000 0.788 0.306
0.75 lasso_wo_x 0.000 0.812 0.319

From the average column, we see that Type II error rates decrease as the proportion of good covariates increases across all methods except no covariates, which is independent of the proportion of good covariates. We can visualize the trends more clearly in the plot below.

Figure 17

In addition to the decrease in error mentioned above, we see that including all covariates has a higher Type II error than selecting the covariates, especially for lower proportions of good covariates. For larger proportions of good covariates, we see that lasso has the lowest Type II error and that the all covariates and full lm approaches become comparable.

b_x = 0.5
Table 11: Type II error by p_good_covs, b_x = 0.5
p_good_covs method typeII_min typeII_max typeII_mean
0.25 no_covs 0.001 0.593 0.184
0.25 all_covs 0.000 0.687 0.144
0.25 r 0.000 0.581 0.125
0.25 partial_r 0.000 0.558 0.117
0.25 full_lm 0.000 0.558 0.123
0.25 full_lm_wo_x 0.000 0.608 0.139
0.25 lasso 0.000 0.558 0.117
0.25 lasso_wo_x 0.000 0.598 0.133
0.50 no_covs 0.001 0.591 0.184
0.50 all_covs 0.000 0.674 0.121
0.50 r 0.000 0.570 0.107
0.50 partial_r 0.000 0.542 0.102
0.50 full_lm 0.000 0.546 0.112
0.50 full_lm_wo_x 0.000 0.607 0.134
0.50 lasso 0.000 0.542 0.100
0.50 lasso_wo_x 0.000 0.588 0.114
0.75 no_covs 0.001 0.595 0.184
0.75 all_covs 0.000 0.666 0.110
0.75 r 0.000 0.567 0.100
0.75 partial_r 0.000 0.537 0.097
0.75 full_lm 0.000 0.546 0.109
0.75 full_lm_wo_x 0.000 0.598 0.135
0.75 lasso 0.000 0.540 0.092
0.75 lasso_wo_x 0.000 0.569 0.102

From the table, we see decreases in Type II error for higher proportions of good covariates for methods that do include covariates. We can see the details in the plot below.

Figure 18

We again see that including all covariates has a higher Type II error rate than selecting covariates to include, although this method improves for higher proportions of good covariates. Partial r performs best for a lower proportion of good covariates while lasso performs best for a higher proportion. Similarly, full lm and all covariates approaches become comparable as the proportion increases.

by correlations

Recall that in these simulations, the correlation among the good covariates was held constant at 0.3. So, we will look at the Type II error rates of each method by the correlation between \(Y\) and the good covariates, for both \(b_x = 0.3\) and \(b_x = 0.5\).

b_x = 0.3
Table 12: Type II error by correlations, b_x = 0.3
r_ycov method typeII_min typeII_max typeII_mean
0.3 no_covs 0.148 0.824 0.485
0.3 all_covs 0.078 0.860 0.442
0.3 r 0.074 0.816 0.429
0.3 partial_r 0.073 0.801 0.421
0.3 full_lm 0.083 0.801 0.425
0.3 full_lm_wo_x 0.090 0.831 0.450
0.3 lasso 0.072 0.800 0.415
0.3 lasso_wo_x 0.074 0.824 0.434
0.5 no_covs 0.148 0.823 0.485
0.5 all_covs 0.000 0.790 0.270
0.5 r 0.000 0.781 0.255
0.5 partial_r 0.000 0.773 0.252
0.5 full_lm 0.000 0.772 0.258
0.5 full_lm_wo_x 0.000 0.805 0.287
0.5 lasso 0.000 0.769 0.248
0.5 lasso_wo_x 0.000 0.787 0.264

In the table, we see that the Type II error decreases as the correlation between \(Y\) and the good covariates increases. The Type II error is highest for including no covariates. We can visualize this below.

Figure 19

In the plots, we see that the no covariates method has the highest Type II error across correlation levels. The Type II errors decrease as the correlation between \(Y\) and the good covariates increases. We also see slight decreases in Type II error from including all covariates to selecting them.

b_x = 0.5
Table 13: Type II error by correlations, b_x = 0.5
r_ycov method typeII_min typeII_max typeII_mean
0.3 no_covs 0.001 0.595 0.184
0.3 all_covs 0.000 0.687 0.170
0.3 r 0.000 0.581 0.154
0.3 partial_r 0.000 0.558 0.146
0.3 full_lm 0.000 0.558 0.153
0.3 full_lm_wo_x 0.000 0.608 0.172
0.3 lasso 0.000 0.558 0.142
0.3 lasso_wo_x 0.000 0.598 0.159
0.5 no_covs 0.001 0.594 0.184
0.5 all_covs 0.000 0.516 0.080
0.5 r 0.000 0.499 0.067
0.5 partial_r 0.000 0.488 0.065
0.5 full_lm 0.000 0.486 0.076
0.5 full_lm_wo_x 0.000 0.563 0.100
0.5 lasso 0.000 0.487 0.064
0.5 lasso_wo_x 0.000 0.516 0.074

From the table, we see that the Type II error decreases as the correlation between \(Y\) and the good covariates increases.

Figure 20

In the plot, we see again that the no covariates approach has the highest Type II error across correlations. There is a slight decrease in Type II error when selecting covariates instead of using all covariates. Among the selection methods, full lm has the higher Type II error for a lower \(Y\)-covariate correlation, but the errors become comparable for a higher correlation.

Estimate, SD, & SE

Here we will compare, across methods, the estimate of \(b_x\), the standard deviation of the estimate, and the average standard error of the estimate. The standard deviation is calculated as the SD of the sampling distribution of the estimates. The standard error is from the linear model output. Since the mean of standard errors would be biased, we calculate the average SE by taking the square root of the mean of the squared standard errors. We compare the differences by subtracting this average linear model SE from the calculated SD.
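A small sketch of this aggregation, using simulated placeholder values in place of the real estimates and model SEs (the mean of 0.3 and scale of 0.15 are illustrative assumptions only):

```r
# Sketch of the aggregation described above, on simulated placeholder values.
set.seed(1)
estimates <- rnorm(20000, mean = 0.3, sd = 0.15)       # sampling distribution of the estimate
SEs       <- 0.15 * sqrt(rchisq(20000, df = 98) / 98)  # model SEs scattered around 0.15

SD_estimate <- sd(estimates)          # SD of the sampling distribution of estimates
mean_SE     <- sqrt(mean(SEs^2))      # RMS average, not a plain mean of the SEs
difference  <- SD_estimate - mean_SE  # near 0 when the model SEs are well calibrated
```

Averaging on the variance scale (squaring before the mean, then taking the square root) is what avoids the bias a plain mean of SEs would introduce.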

b_x = 0.3

Table 14: Estimate, SD, SE: b_x = 0.3
method mean_estimate SD_estimate mean_SE difference
no_covs 0.300 0.178 0.178 0.000
all_covs 0.300 0.155 0.155 0.000
r 0.297 0.145 0.145 0.000
partial_r 0.300 0.147 0.145 0.003
full_lm 0.300 0.156 0.147 0.008
full_lm_wo_x 0.287 0.149 0.148 0.001
lasso 0.300 0.150 0.143 0.007
lasso_wo_x 0.292 0.144 0.143 0.000

Here we see that most methods correctly estimate \(b_x\) to be 0.3; the r, full lm without \(X\), and lasso without \(X\) approaches yielded slightly lower average estimates. The no covariates, all covariates, and r approaches show no difference between the calculated SD and the linear model SE, while the partial r, full lm, and lasso approaches show slight differences.

b_x = 0.5

Table 15: Estimate, SD, SE: b_x = 0.5
method mean_estimate SD_estimate mean_SE difference
no_covs 0.500 0.178 0.178 0.000
all_covs 0.500 0.155 0.155 0.000
r 0.495 0.145 0.145 0.000
partial_r 0.500 0.147 0.145 0.003
full_lm 0.500 0.155 0.147 0.008
full_lm_wo_x 0.479 0.151 0.148 0.002
lasso 0.500 0.150 0.143 0.007
lasso_wo_x 0.487 0.144 0.143 0.001

Similarly, most methods correctly estimate \(b_x\) to be 0.5; the r, full lm without \(X\), and lasso without \(X\) approaches again yielded slightly lower average estimates. The no covariates, all covariates, and r approaches show no difference between the calculated SD and the linear model SE, while the partial r, full lm, and lasso approaches show slight differences.

Sampling Distributions

Here we view sampling distributions of the estimate for \(b_x\) for each method, for both \(b_x = 0.3\) and \(b_x = 0.5\).

b_x = 0.3

Warning: Removed 522893 rows containing non-finite outside the scale range
(`stat_density()`).
Figure 21

In the plot, we can see that the distributions for all methods are centered around 0.3. The no covariates approach again has the widest distribution.

b_x = 0.5

Warning: Removed 529644 rows containing non-finite outside the scale range
(`stat_density()`).
Figure 22

In the plot, we can see that the distributions for all methods are centered around 0.5. The no covariates approach again has the widest distribution.

From these secondary analyses, we see that using no covariates leads to the highest Type II error rates across research settings. Using all covariates provides reduction in Type II error (and thus, more power), but using a non-trivial selection method leads to further reductions in Type II error.

Good covariates true positive & false positive rates

Finally, we will look at the true positive rate (TPR) and false positive rate (FPR) of including good covariates for each method, to compare how well each method found the covariates that were part of the data generating process. Ideally, we want a high TPR, indicating that the method included a high proportion of the good covariates, and a low FPR, indicating that it included a low proportion of the bad covariates. These rates are displayed in the tables below. We include the p-hacked method again.

It also makes sense to consider the TPR & FPR of including good covariates while also considering the actual number of good covariates. We will look at plots of these below as well for \(b_x = 0, 0.3, 0.5\).
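For concreteness, the two rates for a single selection can be computed as below. `selection_rates` is a hypothetical helper written for illustration, not part of the simulation code:

```r
# Hypothetical helper: TPR/FPR of one covariate selection, given which
# covariate indices are truly "good" (nonzero relationship with Y).
selection_rates <- function(selected, good, all_covs) {
  bad <- setdiff(all_covs, good)
  c(tpr = mean(good %in% selected),   # share of good covariates included
    fpr = mean(bad  %in% selected))   # share of bad covariates included
}

# e.g. 8 available covariates, the first 4 good; a method kept 1, 2, 3, and 5
rates <- selection_rates(c(1, 2, 3, 5), good = 1:4, all_covs = 1:8)
#> tpr = 0.75, fpr = 0.25
```

The tables below report these rates averaged over simulations and research settings.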

True Positive Rate (TPR)

b_x = 0

Table 16: TPR, b_x = 0
method covs_tpr_mean
no_covs 0.000
all_covs 1.000
p_hacked 0.496
r 0.947
partial_r 0.945
full_lm 0.641
full_lm_wo_x 0.643
lasso 0.888
lasso_wo_x 0.888

On average, we see that r, partial r, and lasso perform the best, while full lm has a considerably lower TPR.

Figure 23

As the number of good covariates increases, we see that full lm has a large decrease in the TPR and lasso has a smaller decrease. The other methods remain relatively stable. We see that the p-hacked method has a relatively flat rate of 50%, indicating that it finds only half of the good covariates.

b_x = 0.3

Table 17: TPR, b_x = 0.3
method covs_tpr_mean
no_covs 0.000
all_covs 1.000
p_hacked 0.656
r 0.944
partial_r 0.945
full_lm 0.641
full_lm_wo_x 0.635
lasso 0.888
lasso_wo_x 0.884

Similarly, r, partial r, and lasso methods have a higher TPR than full lm.

Figure 24

We see a similar trend in large decreases for full lm and smaller decreases for lasso in the TPR as the number of good covariates increases.

b_x = 0.5

Table 18: TPR, b_x = 0.5
method covs_tpr_mean
no_covs 0.000
all_covs 1.000
p_hacked 0.747
r 0.940
partial_r 0.945
full_lm 0.641
full_lm_wo_x 0.620
lasso 0.888
lasso_wo_x 0.877

Again we see that r, partial r, and lasso perform the best. In this case with a larger \(X\) effect, we see partial r starting to outperform the r method in terms of TPR.

Figure 25

We see the same decreases in TPR for full lm and lasso as the number of good covariates increases.

False Positive Rate (FPR)

b_x = 0

Table 19: FPR, b_x = 0
method covs_fpr_mean
no_covs 0.000
all_covs 1.000
p_hacked 0.466
r 0.050
partial_r 0.050
full_lm 0.050
full_lm_wo_x 0.050
lasso 0.313
lasso_wo_x 0.310

For the FPR, we see that r, partial r, and full lm are all comparable, while lasso performs worse than these methods.

Figure 26

We see that as the number of good covariates increases, most methods remain stable in their FPR, except for lasso.

b_x = 0.3

Table 20: FPR, b_x = 0.3
method covs_fpr_mean
no_covs 0.000
all_covs 1.000
p_hacked 0.415
r 0.050
partial_r 0.050
full_lm 0.050
full_lm_wo_x 0.050
lasso 0.314
lasso_wo_x 0.308

Similarly, r, partial r, and full lm are all comparable, while lasso performs worse than these methods.

Figure 27

The valid selection methods are again comparable, except for lasso which has a changing FPR as the number of good covariates increases.

b_x = 0.5

Table 21: FPR, b_x = 0.5
method covs_fpr_mean
no_covs 0.000
all_covs 1.000
p_hacked 0.375
r 0.050
partial_r 0.050
full_lm 0.050
full_lm_wo_x 0.050
lasso 0.313
lasso_wo_x 0.304

Again we have that r, partial r, and full lm are all comparable, while lasso performs worse and is close to the performance of p-hacking.

Figure 28

The lasso method shows the same trend in FPR here, while the other methods remain stable.

Conclusions

We compared nine methods for selecting covariates to include in linear models. In the first section looking at Type I errors, we demonstrated that the p-hacking approach is not a statistically valid method as it led to inflated Type I error rates and biased parameter estimates (Figure 8). The remaining methods were all shown to be statistically valid, although partial r, lasso, and full lm showed slight inflation of Type I error (Figure 1). The methods can be further compared by their Type II error results. Overall, using no covariates performed the worst as it led to the highest Type II error (and thus the lowest power). The no covariates method also showed more variability in the estimate for \(b_x\) (Figure 8, Figure 21, Figure 22). Including all covariates led to reductions in Type II error, and using one of the selection methods led to further reductions in Type II error.

The Type II error of the selection methods can be examined more closely across research settings. For smaller sample sizes, we see the benefit of using a selection method over using all covariates (Figure 13, Figure 14). Lasso yielded slightly inflated Type I error, but it performs best for larger numbers of covariates (Figure 15, Figure 16) and for larger proportions of good covariates (Figure 17, Figure 18). This can be further seen when looking at the number of good covariates, where lasso again outperforms the other methods. However, as the number of good covariates increases, the true positive rate of lasso including these good covariates decreases relative to r and partial r (Figure 23). While the bivariate correlation method shows no inflation of Type I error, it does underestimate \(b_x\) for both \(b_x = 0.3\) (Table 14) and \(b_x = 0.5\) (Table 15). While the partial correlation method does show slight inflation of Type I error, it correctly estimates \(b_x\). For smaller correlations between \(Y\) and the covariates, we again see the benefit of selecting covariates rather than using all, especially for a larger \(X\) effect (Figure 20). Among the valid non-trivial selection methods, we see that full lm leads to the highest Type I error. When considering larger numbers of good covariates, we see the Type II error increasing for full lm, which, in some cases, performed even worse than including all covariates.

Overall, we can narrow the options for selection methods to bivariate correlation, partial correlation, full lm, and lasso. One of these methods may show more benefits to power depending on the research context. While they each have advantages and disadvantages, researchers should also consider the complexity of these methods when weighing the possible options for selecting covariates.
